Abstract: The goal of this paper is to characterize the best
achievable performance for the problem of estimating an un- 
known parameter having a sparse representation. Specifically, 
we consider the setting in which a sparsely representable deter- 
ministic parameter vector is to be estimated from measurements 
corrupted by Gaussian noise, and derive a lower bound on the 
mean-squared error (MSE) achievable in this setting. To this end, 
an appropriate definition of bias in the sparse setting is developed, 
and the constrained Cramér-Rao bound (CRB) is obtained. This
bound is shown to equal the CRB of an estimator with knowledge 
of the support set, for almost all feasible parameter values. 
Consequently, in the unbiased case, our bound is identical to the 
MSE of the oracle estimator. Combined with the fact that the 
CRB is achieved at high signal-to-noise ratios by the maximum 
likelihood technique, our result provides a new interpretation 
for the common practice of using the oracle estimator as a gold 
standard against which practical approaches are compared. 

EDICS Topics: SSP-PARE, SSP-PERF. 
Index terms: Constrained estimation, Cramér-Rao bound,
sparse estimation. 

I. Introduction 

The problem of estimating a sparse unknown parameter 
vector from noisy measurements has been analyzed intensively 
in the past few years [1]-[4], and has already given rise
to numerous successful signal processing algorithms [5]- 
[9]. In this paper, we consider the setting in which noisy 
measurements of a deterministic vector $x_0$ are available. It is
assumed that $x_0$ has a sparse representation $x_0 = D\alpha_0$, where
$D$ is a given dictionary and most of the entries of $\alpha_0$ equal
zero. Thus, only a small number of "atoms," or columns of
$D$, are required to represent $x_0$. The challenges confronting an
estimation technique are to recover either $x_0$ itself or its sparse
representation $\alpha_0$. Several practical approaches turn out to be
surprisingly successful in this task. Such approaches include 
the Dantzig selector (DS) [4] and basis pursuit denoising 
(BPDN), which is also referred to as the Lasso [1], [2], [10]. 

A standard measure of estimator performance is the mean- 
squared error (MSE). Several recent papers analyzed the MSE 
obtained by methods such as the DS and BPDN [4], [11]. To 
determine the quality of estimation approaches, it is of interest 
to compare their achievements with theoretical performance 
limits: if existing methods approach the performance bound, 
then they are nearly optimal and further improvements in the 
current setting are impossible. This motivates the development 
of lower bounds on the MSE of estimators in the sparse setting. 

Since the parameter to be estimated is deterministic, the 
MSE is in general a function of the parameter value. While 
there are lower bounds on the worst-case achievable MSE
among all possible parameter values [12, §7.4], the actual per- 
formance for a specific value, or even for most values, might 
be substantially lower. Our goal is therefore to characterize 
the minimum MSE obtainable for each particular parameter 
vector. A standard method of achieving this objective is the 
Cramér-Rao bound (CRB) [13], [14].

The fact that $x_0$ has a sparse representation is of central
importance for estimator design. Indeed, many sparse estima- 
tion settings are underdetermined, meaning that without the 
assumption of sparsity, it is impossible to identify the correct 
parameter from its measurements, even without noise. In this 
paper, we treat the sparsity assumption as a deterministic 
prior constraint on the parameter. Specifically, we assume that 
$x_0 \in S$, where $S$ is the set of all parameter vectors which can
be represented by no more than s atoms, for a given integer 
s. 

Our results are inspired by the well-studied theory of the 
constrained CRB [15]-[18]. This theory is based on the 
assumption that the constraint set can be defined using the 
system of equations $f(x) = 0$, $g(x) \le 0$, where $f$ and $g$
are continuously differentiable functions. The resulting bound
depends on the derivatives of the function $f$. However, sparsity
constraints cannot be written in this form. This necessitates the 
development of a bound suitable for non-smooth constraint 
sets [19]. In obtaining this modified bound, we also provide 
new insight into the meaning of the general constrained CRB. 
In particular, we show that the fact that the constrained CRB is 
lower than the unconstrained bound results from an expansion 
of the class of estimators under consideration. 

With the aforementioned theoretical tools at hand, we obtain 
lower bounds on the MSE in a variety of sparse estimation 
problems. Our bound limits the MSE achievable by any esti- 
mator having a pre-specified bias function, for each parameter 
value. Particular emphasis is given to the unbiased case; the 
reason for this preference is twofold: First, when the signal- 
to-noise ratio (SNR) is high, biased estimation is suboptimal. 
Second, for high SNR values, the unbiased CRB is achieved 
by the maximum likelihood (ML) estimator.

While the obtained bounds differ depending on the exact 
problem definition, in general terms and for unbiased estima- 
tion the bounds can be described as follows. For parameters 
having maximal support, i.e., parameters whose representation 
requires the maximum allowed number s of atoms, the lower 
bound equals the MSE of the "oracle estimator" which knows 
the locations (but not the values) of the nonzero representation 
elements. On the other hand, for parameters which do not 
have maximal support (a set which has Lebesgue measure 
zero in S), our lower bound is identical to the CRB for an 
unconstrained problem, which is substantially higher than the 
oracle MSE. 

The correspondence between the CRB and the MSE of
the oracle estimator (for all but a zero-measure subset of the 
feasible parameter set S) is of practical interest since, unlike 
the oracle estimator, the CRB is achieved by the ML estimator 
at high SNR. Our bound can thus be viewed as an alternative 
justification for the common use of the oracle estimator as 
a baseline against which practical algorithms are compared. 
This gives further merit to recent results, which demonstrate 
that BPDN and the DS both achieve near-oracle performance 
[4], [11]. However, the existence of parameters for which 
the bound is much higher indicates that oracular performance 
cannot be attained for all parameter values, at least using 
unbiased techniques. Indeed, as we will show, in many sparse 
estimation scenarios, one cannot construct any estimator which 
is unbiased for all sparsely representable parameters. 

Our contribution is related to, but distinct from, the work of 
Babadi et al. [20], in which the CRB of the oracle estimator 
was derived (and shown to equal the aforementioned oracle 
MSE). Our goal in this work is to obtain a lower bound on 
the performance of estimators which are not endowed with 
oracular knowledge; consequently, as explained above, for 
some parameter values the obtained CRB will be higher than 
the oracle MSE. It was further shown in [20] that when the 
measurements consist of Gaussian random mixtures of the 
parameter vector, there exists an estimator which achieves 
the oracle CRB at high SNR; this is shown to hold on 
average over realizations of the measurement mixtures. The 
present contribution strengthens this result by showing that 
for any given (deterministic) well-behaved measurement setup, 
there exists a technique (namely, the ML estimator) achieving 
the CRB at high SNR. Thus, convergence to the CRB is 
guaranteed for all measurement settings, and not merely when 
averaging over an ensemble of such settings. 

The rest of this paper is organized as follows. In Section II, 
we review the sparse setting as a constrained estimation prob- 
lem. Section III defines a generalization of sparsity constraints, 
which we refer to as locally balanced constraint sets; the 
CRB is then derived in this general setting. In Section IV, 
our general results are applied back to some specific sparse 
estimation problems. In Section V, the CRB is compared to 
the empirical performance of estimators of sparse vectors. Our 
conclusions are summarized in Section VI. 

Throughout the paper, boldface lowercase letters v denote 
vectors while boldface uppercase letters M denote matrices. 
Given a vector function $f : \mathbb{R}^n \to \mathbb{R}^k$, we denote by $\partial f / \partial x$
the $k \times n$ matrix whose $ij$th element is $\partial f_i / \partial x_j$. The support
of a vector, denoted $\mathrm{supp}(v)$, is the set of indices of the
nonzero entries in $v$. The Euclidean norm of a vector $v$ is
denoted $\|v\|_2$, and the number of nonzero entries in $v$ is
$\|v\|_0$. Finally, the symbols $\mathcal{R}(M)$, $\mathcal{N}(M)$, and $M^\dagger$ refer,
respectively, to the column space, null space, and Moore-Penrose
pseudoinverse of the matrix $M$.

II. Sparse Estimation Problems 

In this section, we describe several estimation problems
whose common theme is that the unknown parameter has
a sparse representation with respect to a known dictionary.
We then review some standard techniques used to recover
the unknown parameter in these problems. In Section V we
will compare these methods with the performance bounds we
develop.

A. The Sparse Setting 

Suppose we observe a measurement vector $y \in \mathbb{R}^m$, given by

$$y = A x_0 + w \tag{1}$$

where $x_0 \in \mathbb{R}^n$ is an unknown deterministic signal, $w$ is
independent, identically distributed (IID) Gaussian noise with
zero mean and variance $\sigma^2$, and $A$ is a known $m \times n$ matrix.
We assume the prior knowledge that there exists a sparse
representation of $x_0$, or, more precisely, that

$$x_0 \in S = \{x \in \mathbb{R}^n : x = D\alpha, \ \|\alpha\|_0 \le s\}. \tag{2}$$

In other words, the set S describes signals x which can be 
formed from a linear combination of no more than s columns, 
or atoms, from $D$. The dictionary $D$ is an $n \times p$ matrix with
$n \le p$, and we assume that $s < p$, so that only a subset of
the atoms in D can be used to represent any signal in S. We 
further assume that D and s are known. 

Quite a few important signal recovery applications can be 
formulated using the setting described above. For example, 
if $A = I$, then $y$ consists of noisy observations of $x_0$, and
recovering $x_0$ is a denoising problem [5], [6]. If $A$ corresponds
to a blurring kernel, we obtain a deblurring problem [7]. In 
both cases, the matrix A is square and invertible. Interpolation 
and inpainting can likewise be formulated as (1), but in those 
cases A is an underdetermined matrix, i.e., we have m < n 
[9]. For all of these estimation scenarios, our goal is to obtain 
an estimate $\hat{x}$ whose MSE is as low as possible, where the
MSE is defined as

$$\mathrm{MSE} = E\{\|\hat{x} - x_0\|_2^2\}. \tag{3}$$

Note that $x_0$ is deterministic, so that the expectation in (3)
(and throughout the paper) is taken over the noise $w$ but not
over $x_0$. Thus, the MSE is in general a function of $x_0$.

In the above settings, the goal is to estimate the unknown 
signal $x_0$. However, it may also be of interest to recover the
coefficient vector $\alpha_0$ for which $x_0 = D\alpha_0$, e.g., for the
purpose of model selection [1], [4]. In this case, the goal is
to construct an estimator $\hat{\alpha}$ whose MSE $E\{\|\hat{\alpha} - \alpha_0\|_2^2\}$ is
as low as possible. Unless $D$ is unitary, estimating $\alpha_0$ is
not equivalent to estimating $x_0$. Note, however, that when
estimating $\alpha_0$, the matrices $A$ and $D$ can be combined to
obtain the equivalent problem

$$y = H\alpha_0 + w \tag{4}$$

where $H = AD$ is an $m \times p$ matrix and

$$\alpha_0 \in T = \{\alpha \in \mathbb{R}^p : \|\alpha\|_0 \le s\}. \tag{5}$$

Therefore, this problem can also be seen as a special case of
(1) and (2). Nevertheless, it will occasionally be convenient to
refer specifically to the problem of estimating $\alpha_0$ from (4).

Signal estimation problems differ in the properties of the
dictionary D and measurement matrix A. In particular, prob- 
lems of a very different nature arise depending on whether 
the dictionary is a basis or an overcomplete frame. For 
example, many approaches to denoising yield simple shrinkage 
techniques when $D$ is a basis, but deteriorate to NP-hard
optimization problems when D is overcomplete [21]. 

A final technical comment is in order. If the matrix $H$ in (4)
does not have full column rank, then there may exist different
feasible parameters $\alpha_1$ and $\alpha_2$ such that $H\alpha_1 = H\alpha_2$. In
this case, the probability distribution of $y$ will be identical for
these two parameter vectors, and the estimation problem is said
to be unidentifiable [22, §1.5.2]. A necessary and sufficient
condition for identifiability is

$$\mathrm{spark}(H) > 2s \tag{6}$$

where $\mathrm{spark}(H)$ is defined as the smallest integer $k$ such that
there exist $k$ linearly dependent columns in $H$ [23]. We will
adopt the assumption (6) throughout the paper. Similarly, in
the problem (1) we will assume that

$$\mathrm{spark}(D) > 2s. \tag{7}$$


B. Estimation Techniques 

We now review some standard estimators for the sparse 
problems described above. These techniques are usually 
viewed as methods for obtaining an estimate $\hat{\alpha}$ of the vector
$\alpha_0$ in (4), and we will adopt this perspective in the current
section. One way to estimate $x_0$ in the more general problem
(1) is to first estimate $\alpha_0$ with the methods described below
and then use the formula $\hat{x} = D\hat{\alpha}$.

A widely-used estimation technique is the ML approach,
which provides an estimate of $\alpha_0$ by solving

$$\min_{\alpha} \|y - H\alpha\|_2^2 \quad \text{s.t.} \quad \|\alpha\|_0 \le s. \tag{8}$$
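
For very small problems, (8) can be solved by trying every $s$-element support and keeping the least-squares fit with the smallest residual. The sketch below is one hedged way to do this using exact rational arithmetic from the standard library; the helper names (`solve`, `least_squares_on_support`, `ml_exhaustive`) are hypothetical, and the code assumes that every $s$ columns of $H$ are linearly independent, which holds under (6). Its cost grows combinatorially with $p$ and $s$.

```python
from fractions import Fraction
from itertools import combinations


def solve(M, b):
    """Solve the square system M a = b exactly (M assumed invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]  # normalize the pivot row
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [vi - f * vc for vi, vc in zip(aug[i], aug[c])]
    return [row[n] for row in aug]


def least_squares_on_support(H, y, support):
    """LS fit of y using only the columns of H in `support`.

    Returns the full-length coefficient vector and the squared residual.
    """
    m, p = len(H), len(H[0])
    cols = [[H[i][j] for i in range(m)] for j in support]
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    coef = solve(gram, rhs)  # normal equations on the chosen support
    alpha = [Fraction(0)] * p
    for j, c in zip(support, coef):
        alpha[j] = c
    fitted = [sum(H[i][j] * alpha[j] for j in support) for i in range(m)]
    residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return alpha, residual


def ml_exhaustive(H, y, s):
    """Minimize ||y - H a||_2^2 subject to ||a||_0 <= s by exhaustive search."""
    best = None
    for support in combinations(range(len(H[0])), s):
        alpha, res = least_squares_on_support(H, y, support)
        if best is None or res < best[1]:
            best = (alpha, res)
    return best


H = [[1, 0, 1],
     [0, 1, 1]]
alpha_hat, residual = ml_exhaustive(H, [2, 3], s=1)
```

Searching only supports of size exactly $s$ is sufficient, since a vector with fewer nonzero entries is obtained whenever some fitted coefficients are zero.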



Unfortunately, (8) is a nonconvex optimization problem and 
solving it is NP-hard [21], meaning that an efficient algorithm 
providing the ML estimator is unlikely to exist. In fact, to the 
best of our knowledge, the most efficient method for solving 
(8) for general $H$ is to enumerate the $\binom{p}{s}$ possible $s$-element
support sets of $\alpha$ and choose the one for which $\|y - H\alpha\|_2^2$ is
minimal. This is clearly an impractical strategy for reasonable
values of $p$ and $s$. Consequently, several efficient alternatives
have been proposed for estimating $\alpha_0$. One of these is the
$\ell_1$-penalty version of BPDN [1], which is defined as a solution
$\hat{\alpha}_{\mathrm{BP}}$ to the quadratic program

$$\min_{\alpha} \tfrac{1}{2}\|y - H\alpha\|_2^2 + \gamma \|\alpha\|_1 \tag{9}$$



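with some regularization parameter $\gamma$. More recently, the DS
was proposed [4]; this approach estimates $\alpha_0$ as a solution
$\hat{\alpha}_{\mathrm{DS}}$ to

$$\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad \|H^T (y - H\alpha)\|_\infty \le \tau \tag{10}$$

where $\tau$ is again a user-selected parameter. A modification of
the DS, known as the Gauss-Dantzig selector (GDS) [4], is to
use $\hat{\alpha}_{\mathrm{DS}}$ only to estimate the support of $\alpha_0$. In this approach,
one solves (10) and determines the support set of $\hat{\alpha}_{\mathrm{DS}}$. The
GDS estimate is then obtained as

$$\hat{\alpha}_{\mathrm{GDS}} = \begin{cases} H_{\hat{\alpha}_{\mathrm{DS}}}^\dagger y & \text{on the support set of } \hat{\alpha}_{\mathrm{DS}} \\ 0 & \text{elsewhere} \end{cases} \tag{11}$$

where $H_{\hat{\alpha}_{\mathrm{DS}}}$ consists of the columns of $H$ corresponding to
the support of $\hat{\alpha}_{\mathrm{DS}}$.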

Previous research on the performance of these estimators 
has primarily examined their worst-case MSE among all
possible values of $\alpha_0 \in T$. Specifically, it has been shown
[4] that, under suitable conditions on $H$, $s$, and $\tau$, the DS of
(10) satisfies

$$\|\alpha_0 - \hat{\alpha}_{\mathrm{DS}}\|_2^2 \le C s \sigma^2 \log p \quad \text{with high probability} \tag{12}$$

for some constant $C$. It follows that the MSE of the DS is
also no greater than a constant times $s\sigma^2 \log p$ for all $\alpha_0 \in T$
[12]. An identical property was also demonstrated for BPDN
(9) with an appropriate choice of $\gamma$ [11]. Conversely, it is
known that the worst-case error of any estimator is at least
a constant times $s\sigma^2 \log p$ [12, §7.4]. Thus, both BPDN and
the DS are optimal, up to a constant, in terms of worst-case
error. Nevertheless, the MSE of these approaches for specific
values of $\alpha_0$, even for a vast majority of such values, might
be much lower. Our goal differs from this line of work in that
we characterize the pointwise performance of an estimator,
i.e., the MSE for specific values of $\alpha_0$.

Another baseline with which practical techniques are often 
compared is the oracle estimator, given by 



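$$\hat{\alpha}_{\mathrm{oracle}} = \begin{cases} H_{\alpha_0}^\dagger y & \text{on the set } \mathrm{supp}(\alpha_0) \\ 0 & \text{elsewhere} \end{cases} \tag{13}$$

where $H_{\alpha_0}$ is the submatrix constructed from the columns
of $H$ corresponding to the nonzero entries of $\alpha_0$. In other
words, $\hat{\alpha}_{\mathrm{oracle}}$ is the least-squares (LS) solution among vectors
whose support coincides with $\mathrm{supp}(\alpha_0)$, which is assumed to
have been provided by an "oracle." Of course, in practice the
support of $\alpha_0$ is unknown, so that $\hat{\alpha}_{\mathrm{oracle}}$ cannot actually
be implemented. Nevertheless, one often compares the performance
of true estimators with $\hat{\alpha}_{\mathrm{oracle}}$, whose MSE is given
by [4]

$$\sigma^2 \, \mathrm{Tr}\big( (H_{\alpha_0}^T H_{\alpha_0})^{-1} \big). \tag{14}$$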
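
The oracle MSE can be obtained in a few lines from the measurement model (4). The following derivation is a sketch under the paper's standing assumptions (white Gaussian noise with variance $\sigma^2$, and $H_{\alpha_0}$ having full column rank, which follows from (6)); the shorthand $\alpha_{0,\Lambda}$ for the nonzero entries of $\alpha_0$ is introduced here only for illustration.

```latex
% Restrict attention to the support \Lambda = supp(\alpha_0), so that
% y = H_{\alpha_0} \alpha_{0,\Lambda} + w.
\begin{align*}
\hat{\alpha}_{\mathrm{oracle},\Lambda}
  &= H_{\alpha_0}^\dagger y
   = (H_{\alpha_0}^T H_{\alpha_0})^{-1} H_{\alpha_0}^T
     \left( H_{\alpha_0} \alpha_{0,\Lambda} + w \right)
   = \alpha_{0,\Lambda} + H_{\alpha_0}^\dagger w, \\
E\left\{ \| \hat{\alpha}_{\mathrm{oracle}} - \alpha_0 \|_2^2 \right\}
  &= E\left\{ w^T (H_{\alpha_0}^\dagger)^T H_{\alpha_0}^\dagger w \right\}
   = \sigma^2 \, \mathrm{Tr}\!\left( H_{\alpha_0}^\dagger (H_{\alpha_0}^\dagger)^T \right)
   = \sigma^2 \, \mathrm{Tr}\!\left( (H_{\alpha_0}^T H_{\alpha_0})^{-1} \right).
\end{align*}
```

The entries outside the support contribute no error, since both the oracle estimate and $\alpha_0$ are zero there. The first line also shows that the oracle estimator is unbiased.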



Is (14) a bound on estimation MSE? While $\hat{\alpha}_{\mathrm{oracle}}$ is a
reasonable technique to adopt if $\mathrm{supp}(\alpha_0)$ is known, this does
not imply that (14) is a lower bound on the performance
of practical estimators. Indeed, as will be demonstrated in
Section V, when the SNR is low, both BPDN and the DS
outperform $\hat{\alpha}_{\mathrm{oracle}}$, thanks to the use of shrinkage in these
estimators. Furthermore, if $\mathrm{supp}(\alpha_0)$ is known, then there
exist biased techniques which are better than $\hat{\alpha}_{\mathrm{oracle}}$ for all
values of $\alpha_0$ [24]. Thus, $\hat{\alpha}_{\mathrm{oracle}}$ is neither achievable in
practice, nor optimal in terms of MSE. As we will see, one
can indeed interpret (14) as a lower bound on the achievable
MSE, but such a result requires a certain restriction of the
class of estimators under consideration.

III. The Constrained Cramér-Rao Bound

A common technique for determining the achievable perfor- 
mance in a given estimation problem is to calculate the CRB, 
which is a lower bound on the MSE of estimators having a 
given bias [13]. In this paper, we are interested in calculating 
the CRB when it is known that the parameter x satisfies 
sparsity constraints such as those of the sets S of (2) and 
$T$ of (5).

The CRB for constrained parameter sets has been studied 
extensively in the past [15]-[18]. However, in prior work 
derivation of the CRB assumed that the constraint set is given 
by 

$$X = \{x \in \mathbb{R}^n : f(x) = 0, \ g(x) \le 0\} \tag{15}$$

where $f(x)$ and $g(x)$ are continuously differentiable functions.
We will refer to such $X$ as continuously differentiable
sets. As shown in prior work [15], the resulting bound depends
on the derivatives of the function $f$. Yet in some cases,
including the sparse estimation scenarios discussed in Section II,
the constraint set cannot be written in the form (15), and the 
aforementioned results are therefore inapplicable. Our goal 
in the current section is to close this gap by extending the 
constrained CRB to constraint sets X encompassing the sparse 
estimation scenario. 

We begin this section with a general discussion of the 
CRB and the class of estimators to which it applies. This 
will lead us to interpret the constrained CRB as a bound 
on estimators having an incompletely specified bias gradient. 
This interpretation will facilitate the application of the existing 
constrained CRB to the present context. 

A. Bias Requirements in the Constrained CRB 

In previous settings for which the constrained CRB was 
derived, it was noted that the resulting bound is typically 
lower than the unconstrained version [15, Remark 4]. At first 
glance, one would attribute the reduction in the value of the 
CRB to the fact that the constraints add information about 
the unknown parameter, which can then improve estimation 
performance. On the other hand, the CRB separately char- 
acterizes the achievable performance for each value of the 
unknown parameter $x_0$. Thus, the CRB at $x_0$ applies even to
estimators designed specifically to perform well at $x_0$. Such
estimators surely cannot achieve further gain in performance
if it is known that $x_0 \in X$. Why, then, is the constrained
CRB lower than the unconstrained bound? The answer to this 
apparent paradox involves a careful definition of the class of 
estimators to which the bound applies. 

To obtain a meaningful bound, one must exclude some 
estimators from consideration. Unless this is done, the bound 
will be tarnished by estimators of the type $\hat{x} = x_u$, for some
constant $x_u$, which achieve an MSE of 0 at the specific point
$x = x_u$. It is standard practice to circumvent this difficulty
by restricting attention to estimators having a particular bias
$b(x) = E\{\hat{x}\} - x$. In particular, it is common to examine
unbiased estimators, for which $b(x) = 0$.

However, in some settings, it is impossible to construct 
estimators which are unbiased for all $x \in \mathbb{R}^n$. For example,
suppose we are to estimate the coefficients $\alpha_0$ of an overcomplete
dictionary based on the measurements given by (4).
Since the dictionary is overcomplete, its nullspace is nontrivial; 
furthermore, each coefficient vector in the nullspace yields an 
identical distribution of the measurements, so that an estimator 
can be unbiased for one of these vectors at most. 

The question is whether it is possible to construct estimators 
which are unbiased for some, but not all, values of x. One 
possible approach is to seek estimators which are unbiased 
for all $x \in X$. However, as we will see later in this section,
even this requirement can be too strict: in some cases it is
impossible to construct estimators which are unbiased for all
$x \in X$. More generally, the CRB is a local bound, meaning
that it determines the achievable performance at a particular
value of $x$ based on the statistics at $x$ and at nearby values.
Thus, it is irrelevant to introduce requirements on estimation
performance for parameters which are distant from the value
$x$ of interest.

Since we seek a locally unbiased estimator, one possibility 
is to require unbiasedness at a single point, say $x_u$. As it turns
out, it is always possible to construct such a technique: this
is again $\hat{x} = x_u$, which is unbiased at $x_u$ but nowhere else.
To avoid this loophole, one can require an estimator to be
unbiased in the neighborhood

$$B_\epsilon(x_0) = \{x \in \mathbb{R}^n : \|x - x_0\|_2 < \epsilon\} \tag{16}$$

of $x_0$, for some small $\epsilon$. It follows that both the bias $b(x)$ and
the bias gradient

$$B(x) = \frac{\partial b}{\partial x} \tag{17}$$

vanish at $x = x_0$. This formulation is the basis of the
unconstrained unbiased CRB, a lower bound on the covariance
at $x_0$ which applies to all estimators whose bias gradient is
zero at $x_0$.

It turns out that even this requirement is too stringent 
in constrained settings. As we will see in Section IV-A, 
estimators of the coefficients of an overcomplete dictionary 
must have a nonzero bias gradient matrix. The reason is related 
to the fact that unbiasedness is required over the set $B_\epsilon(x_0)$,
which, in the overcomplete setting, has a higher dimension
than the number of measurements.

However, it can be argued that one is not truly interested
in the bias at all points in $B_\epsilon(x_0)$, since many of these
points violate the constraint set $X$. A reasonable compromise
is to require unbiasedness over $B_\epsilon(x_0) \cap X$, i.e., over the
neighborhood of $x_0$ restricted to the constraint set $X$. This
leads to a weaker requirement on the bias gradient $B$ at $x_0$.
Specifically, the derivatives of the bias need only be specified 
in directions which do not violate the constraints. The exact 
formulation of this requirement depends on the nature of the 
set X. In the following subsections, we will investigate various 
constraint sets and derive the corresponding requirements on 
the bias function. 

It is worth emphasizing that the dependence of the CRB on
the constraints is manifested through the class of estimators
being considered, or more specifically, through the allowed
estimators' bias gradient matrices. By contrast, the unconstrained
CRB applies to estimators having a fully specified bias
gradient matrix. Consequently, the constrained bound applies
to a wider class of estimators, and is thus usually lower
than the unconstrained version of the CRB. In other words,
estimators which are unbiased in the constrained setting, and
thus applicable to the unbiased constrained CRB, are likely to
be biased in the unconstrained context. Since a wider class of
estimators is considered by the constrained CRB, the resulting
bound is lower, thus explaining the puzzling phenomenon
described in the beginning of this subsection.

Fig. 1. In a locally balanced set such as a union of subspaces (a) and an open ball (b), each point is locally defined by a set of feasible directions along
which an infinitesimal movement does not violate the constraints. The curve (c) is not characterized in this way and thus is not locally balanced.

B. Locally Balanced Constraints 

We now consider a class of constraint sets, called locally 
balanced sets, which encompass the sparsity constraints of 
Section II. Roughly speaking, a locally balanced set is one 
which is locally defined at each point by the directions along 
which one can move without leaving the set. Formally, a metric 
space $X$ is said to be locally balanced if, for all $x \in X$, there
exists an open set $C \subset X$ such that $x \in C$ and such that, for
all $x' \in C$ and for all $|\lambda| \le 1$, we have

$$x + \lambda(x' - x) \in C. \tag{18}$$

As we will see, locally balanced sets are useful in the context 
of the constrained CRB, as they allow us to identify the 
feasible directions along which the bias gradient must be 
specified. 

An example of a locally balanced set is given in Fig. 1(a), 
which represents a union of two subspaces. In Fig. 1(a), for any 
point $x \in X$, and for any point $x' \in X$ sufficiently close to $x$,
the entire line segment between $x$ and $x'$, as well as the line
segment in the opposite direction, are also in $X$. This illustrates
the fact that any union of subspaces is locally balanced, and,
in particular, so are the sparse estimation settings of Section II
[25]-[27]. As another example, consider any open set, such as
the open ball in Fig. 1(b). For such a set, any point $x$ has a
sufficiently small neighborhood $C$ such that, for any $x' \in C$,
the line segment connecting $x$ to $x'$ is contained in $X$. On
the other hand, the curve in Fig. 1(c) is not locally balanced,
since the line connecting $x$ to any other point on the set does
not lie within the set.¹
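
The three cases above can be illustrated numerically. The following is a minimal sketch that samples a few values of $\lambda$ with $|\lambda| \le 1$ and tests whether $x + \lambda(x' - x)$ stays in the set; the helper names, the sampled $\lambda$ values, and the use of the unit circle as a stand-in for the curve of Fig. 1(c) are all assumptions made for illustration. A finite sample of $\lambda$ can only refute local balance, not prove it.

```python
import math


def l0_norm(v):
    """Number of nonzero entries of v."""
    return sum(1 for c in v if c != 0)


def segment_stays_in_set(x, x_prime, member, lambdas=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Check that x + lam * (x' - x) satisfies `member` for each sampled lam."""
    return all(
        member([xi + lam * (xpi - xi) for xi, xpi in zip(x, x_prime)])
        for lam in lambdas
    )


s = 2  # sparsity level of the set of s-sparse vectors


def in_sparse_set(a):
    """Membership test for vectors with at most s nonzero entries."""
    return l0_norm(a) <= s


def on_unit_circle(v):
    """Membership test for the unit circle, a smooth curve in the plane."""
    return abs(v[0] ** 2 + v[1] ** 2 - 1.0) < 1e-9


x = [1.0, 0.0, 0.0, 0.0]       # sparse point with a single nonzero entry
near = [1.05, 0.0, -0.02, 0.0]  # sparse point close to x
far = [0.0, 0.0, 1.0, 1.0]      # sparse point far from x

sparse_near_ok = segment_stays_in_set(x, near, in_sparse_set)
sparse_far_ok = segment_stays_in_set(x, far, in_sparse_set)
circle_ok = segment_stays_in_set(
    [1.0, 0.0], [math.cos(0.1), math.sin(0.1)], on_unit_circle
)
```

The nearby sparse point passes the check, while the distant one fails because the segment leaves the sparse set; this is why the definition is stated in terms of a sufficiently small neighborhood $C$. The circle fails even for a nearby point, in line with the curve in Fig. 1(c).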

Observe that the neighborhood of a point $x$ in a locally
balanced set $X$ is entirely determined by the set of feasible
directions $v$ along which infinitesimal changes of $x$ do not
violate the constraints. These are the directions $v = x' - x$
for all points $x' \ne x$ in the set $C$ of (18). Recall that we
seek a lower bound on the performance of estimators whose
bias gradient is defined over the neighborhood of $x_0$ restricted
to the constraint set $X$. Suppose for concreteness that we
are interested in unbiased estimators. For a locally balanced
constraint set $X$, this implies that

$$Bv = 0 \tag{19}$$

for any feasible direction $v$. In other words, all feasible
directions must be in the nullspace of $B$. This is a weaker
condition than requiring the bias gradient to equal zero, and
is thus more useful for constrained estimation problems. If
an estimator $\hat{x}$ satisfies (19) for all feasible directions $v$ at
a certain point $x_0$, we say that $\hat{x}$ is $X$-unbiased at $x_0$. This
terminology emphasizes the fact that $X$-unbiasedness depends
both on the point $x_0$ and on the constraint set $X$.

Consider the subspace $\mathcal{F}$ spanned by the feasible directions
at a certain point $x \in X$. We refer to $\mathcal{F}$ as the feasible
subspace at $x$. Note that $\mathcal{F}$ may include infeasible directions, if
these are linear combinations of feasible directions. Nevertheless,
because of the linearity of (19), any vector $u \in \mathcal{F}$ satisfies
$Bu = 0$, even if $u$ is infeasible. Thus, $X$-unbiasedness is
actually a property of the feasible subspace $\mathcal{F}$, rather than the
set of feasible directions.

Since $X$ is a subset of a finite-dimensional Euclidean
space, $\mathcal{F}$ is also finite-dimensional, although different points
in $X$ may yield subspaces having differing dimensions. Let
$u_1, \ldots, u_l$ denote an orthonormal basis for $\mathcal{F}$, and define the
matrix

$$U = [u_1, \ldots, u_l]. \tag{20}$$

Note that $u_i$ and $U$ are functions of $x$. For a given $x$,
different orthonormal bases can be chosen, but the choice
of a basis is arbitrary and will not affect our results.

¹We note in passing that since the curve in Fig. 1(c) is continuously
differentiable, it can be locally approximated by a locally balanced set. Our
derivation of the CRB can be extended to such approximately locally balanced
sets in a manner similar to that of [15], but such an extension is not necessary
for the purposes of this paper.

As we have seen, $X$-unbiasedness at $x_0$ can alternatively
be written as $Bu = 0$ for all $u \in \mathcal{F}$, or, equivalently,

$$BU = 0. \tag{21}$$

The constrained CRB can now be derived as a lower bound
on all $X$-unbiased estimators, which is a weaker requirement
than "ordinary" unbiasedness.

Just as $X$-unbiasedness was defined by requiring the bias
gradient matrix to vanish when multiplied by any feasible
direction vector, we can define $X$-biased estimators by requiring
a specific value (not necessarily zero) for the bias gradient
matrix when multiplied by a feasible direction vector. In an
analogy to (21), this implies that one must define a value for
the matrix $BU$. Our goal is thus to construct a lower bound
on the covariance at a given $x$ achievable by any estimator
whose bias gradient $B$ at $x$ satisfies $BU = P$, for a given
matrix $P$. This is referred to as specifying the $X$-bias of the
estimator at $x$.

C. The CRB for Locally Balanced Constraints 

It is helpful at this point to compare our derivation with prior 
work on the constrained CRB, which considered continuously 
differentiable constraint sets of the form (15). It has been 
previously shown [15] that inequality constraints of the type 
$g(x) \le 0$ have no effect on the CRB. Consequently, we will
consider constraints of the form

$$X = \{x \in \mathbb{R}^n : f(x) = 0\}. \tag{22}$$



Define the $k \times n$ matrix $F(x) = \partial f / \partial x$. For simplicity of
notation, we will omit the dependence of $F$ on $x$. Assuming
that the constraints are non-redundant, $F$ is a full-rank matrix,
and thus one can define an $n \times (n - k)$ matrix $W$ (also
dependent on $x$) such that

$$FW = 0, \quad W^T W = I. \tag{23}$$

The matrix $W$ is closely related to the matrix $U$ spanning the
feasible direction subspace of locally balanced sets. Indeed, the
column space $\mathcal{R}(W)$ of $W$ is the tangent space of $X$, i.e.,
the subspace of $\mathbb{R}^n$ containing all vectors which are tangent
to $X$ at the point $x$. Thus, the vectors in $\mathcal{R}(W)$ are precisely
those directions along which infinitesimal motion from $x$ does
not violate the constraints, up to a first-order approximation. It
follows that if a particular set $X$ is both locally balanced and
continuously differentiable, its matrices $U$ and $W$ coincide.
Note, however, that there exist sets which are locally balanced 
but not continuously differentiable (and vice versa). 

With the above formulation, the CRB for continuously 
differentiable constraints can be stated as a function of the
matrix $W$ and the bias gradient $B$ [18]. In fact, the
resulting bound depends on $B$ only through $BW$. This is
to be expected in light of the discussion of Section III-A: The
bias should be specified only for those directions which do not
violate the constraint set. Furthermore, the proof of the CRB
in [18, Theorem 1] depends not on the formulation (22) of the
constraint set, but merely on the class of bias functions under
consideration. Consequently, one can state the bound without
any reference to the underlying constraint set. To do so, let $y$
be a measurement vector with pdf $p(y; x)$, which is assumed
to be differentiable with respect to $x$. The Fisher information
matrix (FIM) $J(x)$ is defined as

$$J(x) = E\{\Delta \Delta^T\} \tag{24}$$

where

$$\Delta = \frac{\partial \log p(y; x)}{\partial x}. \tag{25}$$



We assume that the FIM is well-defined and finite. We further 
assume that integration with respect to y and differentiation 
with respect to $x$ can be interchanged, a standard requirement
for the CRB. We then have the following result. 

Theorem 1: Let $\hat{x}$ be an estimator and let $B = \partial b / \partial x$
denote the bias gradient matrix of $\hat{x}$ at a given point $x_0$. Let
$U$ be an orthonormal matrix, and suppose that $BU$ is known,
but that $B$ is otherwise arbitrary. If

$$\mathcal{R}\big(U (U + BU)^T\big) \subseteq \mathcal{R}\big(U U^T J U U^T\big) \tag{26}$$

then the covariance of $\hat{x}$ at $x_0$ satisfies

$$\mathrm{Cov}(\hat{x}) \succeq (U + BU) \big(U^T J U\big)^\dagger (U + BU)^T. \tag{27}$$

Equality is achieved in (27) if and only if

$$\hat{x} = x_0 + b(x_0) + (U + BU) \big(U^T J U\big)^\dagger U^T \Delta \tag{28}$$

in the mean square sense, where $\Delta$ is defined by (25).
Conversely, if (26) does not hold, then there exists no finite-variance
estimator with the required bias gradient.

As required, no mention of constrained estimation is made 
in Theorem 1; instead, partial information about the bias 
gradient is assumed. Apart from this restatement, the theorem 
is identical to [18, Theorem 1], and its proof is unchanged. 
However, the above formulation is more general in that it can 
be applied to any constrained setting, once the constraints have 
been translated to bias gradient requirements. In particular, 
Theorem 1 provides a CRB for locally balanced sets if the 
matrix U is chosen as a basis for the feasible direction 
subspace of Section III-B. 
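
To make the statement of Theorem 1 concrete, the following is a minimal numerical sketch that tests the range condition (26) and, when it holds, evaluates the right-hand side of (27). The function name `constrained_crb`, the tolerances, and the 3 × 3 example are illustrative assumptions rather than part of the original derivation. The range inclusion is tested by projecting onto the range of $UU^T J UU^T$.

```python
import numpy as np

def constrained_crb(J, U, BU, tol=1e-9):
    """Evaluate the right-hand side of (27), or return None when the
    range condition (26) fails.

    J  : (n, n) Fisher information matrix
    U  : (n, k) matrix with orthonormal columns
    BU : (n, k) known part of the bias gradient (zeros in the unbiased case)
    """
    G = U + BU
    lhs = U @ G.T                  # U (U + BU)^T
    rhs = U @ U.T @ J @ U @ U.T    # U U^T J U U^T
    # R(lhs) lies in R(rhs) iff projecting lhs onto R(rhs) leaves it unchanged.
    proj = rhs @ np.linalg.pinv(rhs, rcond=tol)
    if not np.allclose(proj @ lhs, lhs, atol=1e-8):
        return None
    return G @ np.linalg.pinv(U.T @ J @ U, rcond=tol) @ G.T

# Hypothetical example: a singular FIM that carries no information on x_3.
J = np.diag([2.0, 5.0, 0.0])
U_two = np.eye(3)[:, :2]           # feasible directions e_1, e_2 only
bound_two = constrained_crb(J, U_two, np.zeros((3, 2)))
bound_all = constrained_crb(J, np.eye(3), np.zeros((3, 3)))
```

In this example the bound is finite when only the two observable directions are feasible, while requiring unbiasedness in all three directions violates (26), so `bound_all` is `None`.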

IV. Bounds on Sparse Estimation 

In this section, we apply the CRB of Theorem 1 to several 
sparse estimation scenarios. We begin with an analysis of the 
problem of estimating a sparse parameter vector. 

A. Estimating a Sparse Vector 

Suppose we would like to estimate a parameter vector $\alpha_0$, 
known to belong to the set $\mathcal{T}$ of (5), from measurements y 
given by (4). To determine the CRB in this setting, we begin by 
identifying the feasible subspaces $\mathcal{F}$ corresponding to each of 
the elements in $\mathcal{T}$. To this end, consider first vectors $\alpha \in \mathcal{T}$ 
for which $\|\alpha\|_0 = s$, i.e., vectors having maximal support. 
Denote by $\{i_1, \ldots, i_s\}$ the support set of $\alpha$. Then, for all $\delta$, 
we have 

$$\|\alpha + \delta e_{i_k}\|_0 \le \|\alpha\|_0 = s, \qquad k = 1, \ldots, s \tag{29}$$

where $e_j$ is the jth column of the identity matrix. Thus 
$\alpha + \delta e_{i_k} \in \mathcal{T}$, and consequently, the vectors $\{e_{i_1}, \ldots, e_{i_s}\}$ are 
all feasible directions, as is any linear combination of these 
vectors. On the other hand, for any $j \notin \mathrm{supp}(\alpha)$ and for any 
nonzero $\delta$, we have $\|\alpha + \delta e_j\|_0 = s + 1$, and thus $e_j$ is not a 
feasible direction; neither is any other vector which is not in 
$\mathrm{span}\{e_{i_1}, \ldots, e_{i_s}\}$. It follows that the feasible subspace $\mathcal{F}$ for 
points having maximal support is given by $\mathrm{span}\{e_{i_1}, \ldots, e_{i_s}\}$, 
and a possible choice for the matrix U of (20) is 

$$U = [e_{i_1}, \ldots, e_{i_s}] \quad \text{for } \|\alpha\|_0 = s. \tag{30}$$



The situation is different for points $\alpha$ having $\|\alpha\|_0 < s$. 
In this case, vectors corresponding to any direction $e_i$ are 
feasible directions, since 

$$\|\alpha + \delta e_i\|_0 \le \|\alpha\|_0 + 1 \le s. \tag{31}$$

Because the feasible subspace is defined as the span of all 
feasible directions, we have 

$$\mathcal{F} \supseteq \mathrm{span}\{e_1, \ldots, e_p\} = \mathbb{R}^p. \tag{32}$$

It follows that $\mathcal{F} = \mathbb{R}^p$, and thus a convenient choice for the 
matrix U is 

$$U = I \quad \text{for } \|\alpha\|_0 < s. \tag{33}$$



Consequently, whenever $\|\alpha\|_0 < s$, a specification of the 
$\mathcal{T}$-bias amounts to completely specifying the usual estimation 
bias b(x). 

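To invoke Theorem 1, we must also determine the FIM 
$J(\alpha)$. Under our assumption of white Gaussian noise, $J(\alpha)$ 
is given by [13, p. 85] 

$$J(\alpha) = \frac{1}{\sigma^2} H^T H. \tag{34}$$

Using (30), (33), and (34), it is readily shown that 

$$U^T J U = \begin{cases} \frac{1}{\sigma^2} H_\alpha^T H_\alpha & \text{when } \|\alpha\|_0 = s \\ \frac{1}{\sigma^2} H^T H & \text{when } \|\alpha\|_0 < s \end{cases} \tag{35}$$

where $H_\alpha$ is the p x s matrix consisting of the columns of 
H indexed by $\mathrm{supp}(\alpha)$. 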
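
The two cases of (35) can be checked numerically. The sketch below constructs U according to (30) or (33) and confirms that, for a point with maximal support, $U^T J U$ reduces to the Gram matrix of the active columns of H divided by the noise variance. The helper name `feasible_basis`, the problem sizes, and the random seed are assumptions chosen only for illustration.

```python
import numpy as np

def feasible_basis(alpha, s):
    """Matrix U of (30) or (33) for a point alpha with ||alpha||_0 <= s."""
    p = alpha.size
    support = np.flatnonzero(alpha)
    if support.size == s:
        return np.eye(p)[:, support]   # columns e_{i_1}, ..., e_{i_s}
    return np.eye(p)                   # non-maximal support: U = I

rng = np.random.default_rng(0)         # arbitrary seed for the illustration
H = rng.standard_normal((6, 8))        # hypothetical small measurement matrix
sigma = 0.5
J = H.T @ H / sigma**2                 # FIM of (34)

alpha = np.zeros(8)
alpha[[1, 4, 6]] = [1.0, -2.0, 0.5]    # maximal support when s = 3
U = feasible_basis(alpha, s=3)
H_alpha = H[:, np.flatnonzero(alpha)]
reduced_fim = U.T @ J @ U              # first case of (35)
```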

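We now wish to determine under what conditions (26) 
holds. Consider first points $\alpha_0$ for which $\|\alpha_0\|_0 = s$. Since, 
by (6), we have spark(H) > s, it follows that in this case 
$U^T J U$ is invertible. Therefore 

$$\mathcal{R}\left(UU^T J UU^T\right) = \mathcal{R}\left(UU^T\right). \tag{36}$$

Since 

$$\mathcal{R}\left(U(U + BU)^T\right) \subseteq \mathcal{R}\left(UU^T\right) \tag{37}$$

we have that condition (26) holds when $\|\alpha_0\|_0 = s$. 

The condition (26) is no longer guaranteed when $\|\alpha_0\|_0 < s$. 
In this case, U = I, so that (26) is equivalent to 

$$\mathcal{R}\left(I + B^T\right) \subseteq \mathcal{R}\left(H^T H\right). \tag{38}$$

Using the fact that $\mathcal{R}(H^T H) = \mathcal{R}(H^T)$ and that, for any 
matrix Q, $\mathcal{R}(Q^T) = \mathcal{N}(Q)^\perp$, we find that (38) is equivalent 
to 

$$\mathcal{N}(H) \subseteq \mathcal{N}(I + B). \tag{39}$$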



Combining these conclusions with Theorem 1 yields the 
following CRB for the problem of estimating a sparse vector. 

Theorem 2: Consider the estimation problem (4) with $\alpha_0$ 
given by (5), and assume that (6) holds. For a finite-variance 
estimator $\hat{\alpha}$ of $\alpha_0$ to exist, its bias gradient matrix B must 
satisfy (39) whenever $\|\alpha_0\|_0 < s$. Furthermore, the covariance 
of any estimator whose $\mathcal{T}$-bias gradient matrix is BU satisfies 

$$\begin{aligned} \mathrm{Cov}(\hat{\alpha}) &\succeq \sigma^2 (I + B)\left(H^T H\right)^\dagger \left(I + B^T\right) && \text{when } \|\alpha_0\|_0 < s, \\ \mathrm{Cov}(\hat{\alpha}) &\succeq \sigma^2 (U + BU)\left(H_{\alpha_0}^T H_{\alpha_0}\right)^{-1} (U + BU)^T && \text{when } \|\alpha_0\|_0 = s. \end{aligned} \tag{40}$$

Here, $H_{\alpha_0}$ is the matrix containing the columns of H 
corresponding to $\mathrm{supp}(\alpha_0)$. 

Let us examine Theorem 2 separately in the underdetermined 
and well-determined cases. In the well-determined case, 
in which H has full column rank, the nullspace of H is trivial, 
so that (39) always holds. It follows that the CRB is always 
finite, in the sense that we cannot rule out the existence of 
an estimator having any given bias function. Some insight can 
be obtained in this case by examining the $\mathcal{T}$-unbiased case. 
Noting also that $H^T H$ is invertible in the well-determined 
case, the bound for $\mathcal{T}$-unbiased estimators is given by 

$$\begin{aligned} \mathrm{Cov}(\hat{\alpha}) &\succeq \sigma^2 \left(H^T H\right)^{-1} && \text{when } \|\alpha_0\|_0 < s, \\ \mathrm{Cov}(\hat{\alpha}) &\succeq \sigma^2 U \left(H_{\alpha_0}^T H_{\alpha_0}\right)^{-1} U^T && \text{when } \|\alpha_0\|_0 = s. \end{aligned} \tag{41}$$

From this formulation, the behavior of the CRB can be 
described as follows. When $\alpha_0$ has non-maximal support 
($\|\alpha_0\|_0 < s$), the CRB is identical to the bound which 
would have been obtained had there been no constraints in 
the problem. This is because U = I in this case, so that 
$\mathcal{T}$-unbiasedness and ordinary unbiasedness are equivalent. As we 
have seen in Section III-A, the CRB is a function of the class 
of estimators under consideration, so the unconstrained and 
constrained bounds are equivalent in this situation. The bound 
$\sigma^2 (H^T H)^{-1}$ is achieved by the unconstrained LS estimator 

$$\hat{\alpha} = \left(H^T H\right)^{-1} H^T y \tag{42}$$

which is the minimum variance unbiased estimator in the 
unconstrained case. Thus, we learn from Theorem 2 that for 
values of $\alpha_0$ having non-maximal support, no $\mathcal{T}$-unbiased 
technique can outperform the standard LS estimator, which 
does not assume any knowledge about the constraint set $\mathcal{T}$. 

On the other hand, consider the case in which $\alpha_0$ has 
maximal support, i.e., $\|\alpha_0\|_0 = s$. Suppose first that $\mathrm{supp}(\alpha_0)$ 
is known, so that one must estimate only the nonzero values 
of $\alpha_0$. In this case, a reasonable approach is to use the 
oracle estimator (13), whose covariance matrix is given by 
$\sigma^2 U (H_{\alpha_0}^T H_{\alpha_0})^{-1} U^T$ [4]. Thus, when $\alpha_0$ has maximal 
support, Theorem 2 states that $\mathcal{T}$-unbiased estimators can 
perform, at best, as well as the oracle estimator, which is 
equivalent to the LS approach when the support of $\alpha_0$ is 
known. 
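
The dichotomy in (41) can be seen numerically. The following sketch evaluates both cases for a hypothetical well-determined matrix H; the function name `unbiased_crb`, the dimensions, and the seed are assumptions for illustration only. It also forms the covariance of the LS estimator (42) directly from its linear form, for comparison with the first case.

```python
import numpy as np

def unbiased_crb(H, sigma, alpha0, s):
    """Right-hand side of (41) for a matrix H with full column rank."""
    support = np.flatnonzero(alpha0)
    if support.size < s:
        return sigma**2 * np.linalg.inv(H.T @ H)
    U = np.eye(H.shape[1])[:, support]
    H_sub = H[:, support]
    return sigma**2 * U @ np.linalg.inv(H_sub.T @ H_sub) @ U.T

rng = np.random.default_rng(1)          # arbitrary seed
H = rng.standard_normal((20, 10))       # hypothetical well-determined setting
sigma, s = 0.1, 3

alpha_max = np.zeros(10)
alpha_max[[0, 3, 7]] = 1.0              # maximal support
alpha_small = np.zeros(10)
alpha_small[[0, 3]] = 1.0               # non-maximal support

mse_max = np.trace(unbiased_crb(H, sigma, alpha_max, s))
mse_small = np.trace(unbiased_crb(H, sigma, alpha_small, s))

# Covariance of the LS estimator (42): sigma^2 * G G^T with G = (H^T H)^{-1} H^T.
G = np.linalg.inv(H.T @ H) @ H.T
ls_cov = sigma**2 * G @ G.T
```

The trace at the maximal-support point is smaller than at the non-maximal point, even though the latter vector has fewer nonzero entries, which reflects the local nature of the bound.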

The situation is similar, but somewhat more involved, in 
the underdetermined case. Here, the condition (39) for the 
existence of an estimator having a given bias gradient matrix 
no longer automatically holds. To interpret this condition, it is 
helpful to introduce the mean gradient matrix $M(\alpha)$, defined as 

$$M(\alpha) \triangleq \frac{\partial E\{\hat{\alpha}\}}{\partial \alpha} = I + B. \tag{43}$$



The matrix $M(\alpha)$ is a measure of the sensitivity of an 
estimator to changes in the parameter vector. For example, 
a $\mathcal{T}$-unbiased estimator is sensitive to any feasible change in 
$\alpha$. Thus, $\mathcal{N}(M)$ denotes the subspace of directions to which 
$\hat{\alpha}$ is insensitive. Likewise, $\mathcal{N}(H)$ is the subspace of directions 
for which a change in $\alpha$ does not modify $H\alpha$. The condition 
(39) therefore states that for an estimator to exist, it must be 
insensitive to changes in $\alpha$ which are unobservable through 
$H\alpha$, at least when $\|\alpha\|_0 < s$. No such requirement is imposed 
in the case $\|\alpha\|_0 = s$, since in this case there are far fewer 
feasible directions. 

The lower bound (40) is similarly a consequence of the 
wide range of feasible directions obtained when $\|\alpha\|_0 < s$, as 
opposed to the tight constraints when $\|\alpha\|_0 = s$. Specifically, 
when $\|\alpha\|_0 < s$, a change to any component of $\alpha$ is feasible 
and hence the lower bound equals that of an unconstrained 
estimation problem, with the FIM given by $\sigma^{-2} H^T H$. On 
the other hand, when $\|\alpha\|_0 = s$, the bound is effectively that 
of an estimator with knowledge of the particular subspace to 
which $\alpha$ belongs; for this subspace the FIM is the submatrix 
given in (35). This phenomenon is discussed further 
in Section VI. 

Another difference between the well-determined and 
underdetermined cases is that when H is underdetermined, an 
estimator cannot be $\mathcal{T}$-unbiased for all $\alpha$. To see this, recall 
from (21) that $\mathcal{T}$-unbiased estimators are defined by the fact 
that BU = 0. When $\|\alpha\|_0 < s$, we have U = I and 
thus $\mathcal{T}$-unbiasedness implies B = 0, so that $\mathcal{N}(I + B) = \{0\}$. 
But since H is underdetermined, $\mathcal{N}(H)$ is nontrivial. 
Consequently, (39) cannot hold for $\mathcal{T}$-unbiased estimators 
when $\|\alpha\|_0 < s$. 
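
The failure of (39) for B = 0 is easy to verify numerically. In the sketch below, the helper names `null_space` and `satisfies_39` and the 4 × 6 example are assumptions introduced for illustration. For contrast, the sketch also tests a biased choice, I + B = H⁺H (the orthogonal projector onto the row space of H, which is the mean gradient of the minimum-norm solution H⁺y); this choice is not discussed in the text and is included only as an example of a bias gradient satisfying (39).

```python
import numpy as np

def null_space(A, tol=1e-10):
    """Orthonormal basis of the null space of A, computed from the SVD."""
    _, svals, Vt = np.linalg.svd(A, full_matrices=True)
    rank = int(np.sum(svals > tol * svals.max()))
    return Vt[rank:].T

def satisfies_39(H, B, tol=1e-8):
    """Check N(H) ⊆ N(I + B): (I + B) must annihilate a basis of N(H)."""
    N = null_space(H)
    return bool(np.allclose((np.eye(B.shape[0]) + B) @ N, 0.0, atol=tol))

rng = np.random.default_rng(2)               # arbitrary seed
H = rng.standard_normal((4, 6))              # underdetermined: 4 measurements, 6 unknowns

unbiased_ok = satisfies_39(H, np.zeros((6, 6)))   # B = 0

# Illustrative biased alternative: I + B = H^+ H, the projector onto R(H^T).
B_proj = np.linalg.pinv(H) @ H - np.eye(6)
biased_ok = satisfies_39(H, B_proj)
```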

The lack of $\mathcal{T}$-unbiased estimators when $\|\alpha_0\|_0 < s$ is 
a direct consequence of the fact that the feasible direction 
set at such $\alpha_0$ contains all of the directions $e_1, \ldots, e_p$. The 
conclusion from Theorem 2 is then that no estimator can be 
expected to be unbiased in such a high-dimensional neigh- 
borhood, just as unbiased estimation is impossible in the p- 
dimensional neighborhood ^^(ao), as explained in Section III- 
A. However, it is still possible to obtain a finite CRB in this 
setting by further restricting the constraint set: if it is known 
that $\|\alpha_0\|_0 = \tilde{s} < s$, then one can redefine $\mathcal{T}$ in (5) by 
replacing s with $\tilde{s}$. This will enlarge the class of estimators 
considered $\mathcal{T}$-unbiased, and Theorem 2 would then provide a 
finite lower bound on those estimators. Such estimators will 
not, however, be unbiased in the sense implied by the original 
constraint set. 

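While an estimator cannot be unbiased for all $\alpha \in \mathcal{T}$, 
unbiasedness is possible at points $\alpha$ for which $\|\alpha\|_0 = s$. 
In this case, Theorem 2 produces a bound on the MSE of a 
$\mathcal{T}$-unbiased estimator, obtained by calculating the trace of (40) 
in the case BU = 0. This bound is given by 

$$E\left\{\|\hat{\alpha} - \alpha_0\|_2^2\right\} \ge \sigma^2 \, \mathrm{Tr}\left(\left(H_{\alpha_0}^T H_{\alpha_0}\right)^{-1}\right), \qquad \|\alpha_0\|_0 = s. \tag{44}$$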

The most striking feature of (44) is that it is identical 
to the oracle MSE (14). However, the CRB is of additional 
importance because of the fact that the ML estimator achieves 
the CRB in the limit when a large number of independent 
measurements are available, a situation which is equivalent in 
our setting to the limit $\sigma \to 0$. In other words, an MSE of (44) 
is achieved at high SNR by the ML approach (8), as we will 
illustrate numerically in Section V. While the ML approach 
is computationally intractable in the sparse estimation setting, 
it is still implementable in principle, as opposed to $\hat{\alpha}_{\mathrm{oracle}}$, 
which relies on unavailable information (namely, the support 
set of $\alpha_0$). Thus, Theorem 1 gives an alternative interpretation 
to comparisons of estimator performance with the oracle. 
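
The identity between (44) and the oracle MSE can be checked by simulation. The sketch below is a minimal Monte Carlo experiment under assumed problem sizes, support, amplitudes, and seed, none of which are taken from the paper. It implements only the oracle estimator (LS restricted to the true support), not the ML estimator, which would require a search over all supports.

```python
import numpy as np

def oracle_crb(H, support, sigma):
    """Unbiased CRB (44): sigma^2 * Tr((H_a^T H_a)^{-1}) for the given support."""
    H_sub = H[:, support]
    return sigma**2 * np.trace(np.linalg.inv(H_sub.T @ H_sub))

rng = np.random.default_rng(3)                  # arbitrary seed
m, p, sigma = 30, 60, 0.2                       # hypothetical problem sizes
H = rng.standard_normal((m, p))
H /= np.linalg.norm(H, axis=0)                  # unit-norm columns
support = np.array([5, 17, 42])
alpha0 = np.zeros(p)
alpha0[support] = [1.0, -1.5, 2.0]

# Oracle estimator: least squares restricted to the true support.
trials = 20000
noise = sigma * rng.standard_normal((m, trials))
Y = (H @ alpha0)[:, None] + noise               # one measurement vector per column
est = np.linalg.pinv(H[:, support]) @ Y         # (3, trials) estimates of the nonzeros
err = est - alpha0[support][:, None]
empirical_mse = np.mean(np.sum(err**2, axis=0))
bound = oracle_crb(H, support, sigma)
```

Up to Monte Carlo error, `empirical_mse` and `bound` agree, since the oracle estimator is linear and unbiased with covariance equal to the inverse of the reduced FIM.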

Observe that the bound (44) depends on the value of $\alpha_0$ 
(through its support set, which defines $H_{\alpha_0}$). This implies 
that some values of $\alpha_0$ are more difficult to estimate than 
others. For example, suppose the $\ell_2$ norms of some of the 
columns of H are significantly larger than the remaining 
columns. Measurements of a parameter $\alpha_0$ whose support 
corresponds to the large-norm columns of H will then have 
a much higher SNR than measurements of a parameter 
corresponding to small-norm columns, and this will clearly affect 
the accuracy with which $\alpha_0$ can be estimated. To analyze 
the behavior beyond this effect, it is common to consider the 
situation in which the columns $h_i$ of H are normalized so that 
$\|h_i\|_2 = 1$. In this case, for sufficiently incoherent dictionaries, 
$\mathrm{Tr}((H_{\alpha_0}^T H_{\alpha_0})^{-1})$ is bounded above and below by a small 
constant times s, so that the CRB is similar for all values of 
$\alpha_0$. To see this, let $\mu$ be the coherence of H [1], defined (for 
H having normalized columns) as 

$$\mu \triangleq \max_{i \ne j} \left| h_i^T h_j \right|. \tag{45}$$

By the Gershgorin disc theorem, the eigenvalues of $H_{\alpha_0}^T H_{\alpha_0}$ 
are in the range $[1 - s\mu, 1 + s\mu]$. It follows that the unbiased 
CRB (44) is bounded above and below by 

$$\frac{s\sigma^2}{1 + s\mu} \le \sigma^2 \, \mathrm{Tr}\left(\left(H_{\alpha_0}^T H_{\alpha_0}\right)^{-1}\right) \le \frac{s\sigma^2}{1 - s\mu}. \tag{46}$$



Thus, when s is somewhat smaller than $1/\mu$, the CRB is 
roughly equal to $s\sigma^2$ for all values of $\alpha_0$. As we have 
seen in Section II-B, for sufficiently small s, the worst-case 
MSE of practical estimators, such as BPDN and the DS, is 
$O(s\sigma^2 \log p)$. Thus, practical estimators come almost within 
a constant of the unbiased CRB, implying that they are close 
to optimal for all values of $\alpha_0$, at least when compared with 
unbiased techniques. 
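
The coherence-based bounds can be evaluated directly. The following sketch computes the coherence of a hypothetical dictionary with normalized columns and compares the unbiased CRB with Gershgorin-type lower and upper limits of the form sσ²/(1 ± sμ). The dictionary size, support, and seed are assumptions; a tall dictionary is used here only so that sμ < 1, which the upper limit requires.

```python
import numpy as np

def coherence(H):
    """Largest absolute inner product between distinct unit-norm columns of H."""
    G = np.abs(H.T @ H)
    np.fill_diagonal(G, 0.0)
    return G.max()

rng = np.random.default_rng(4)          # arbitrary seed
H = rng.standard_normal((400, 50))      # hypothetical dictionary with small coherence
H /= np.linalg.norm(H, axis=0)
mu = coherence(H)

s, sigma = 3, 0.1
support = [2, 11, 30]                   # an arbitrary support of size s
H_sub = H[:, support]
crb = sigma**2 * np.trace(np.linalg.inv(H_sub.T @ H_sub))
lower = s * sigma**2 / (1 + s * mu)
upper = s * sigma**2 / (1 - s * mu)     # meaningful only when s * mu < 1
```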

B. Denoising and Deblurring 

We next consider the problem (1), in which it is required to 
estimate not the sparse vector $\alpha_0$ itself, but rather the vector 
$x_0 = D\alpha_0$, where D is a known dictionary matrix. Thus, $x_0$ 
belongs to the set $\mathcal{S}$ of (2). We assume for concreteness that D 
has full row rank and that A has full column rank. This setting 
encompasses the denoising and deblurring problems described 
in Section II-A, with the former arising when A = I and the 
latter obtained when A represents a blurring kernel. Similar 
calculations can be carried out when A is rank-deficient, a 
situation which occurs, for example, in some interpolation 
problems. 

Recall from Section II-A the assumption that every $x \in \mathcal{S}$ 
has a unique representation $x = D\alpha$ for which $\alpha$ is in the 
set $\mathcal{T}$ of (5). We denote by $r(\cdot)$ the mapping from $\mathcal{S}$ to $\mathcal{T}$ 
which returns this representation. In other words, r(x) is the 
unique vector in $\mathcal{T}$ for which 

$$x = D\,r(x) \quad \text{and} \quad \|r(x)\|_0 \le s. \tag{47}$$

Note that while the mapping r is well-defined, actually 
calculating the value of r(x) for a given vector x is, in general, 
NP-hard. 

In the current setting, unlike the scenario of Section IV-A, 
it is always possible to construct an unbiased estimator. 
Indeed, even without imposing the constraint (2), there exists 
an unbiased estimator. This is the LS or maximum likelihood 
estimator, given by 

$$\hat{x} = \left(A^T A\right)^{-1} A^T y. \tag{48}$$

A standard calculation demonstrates that the covariance of $\hat{x}$ 
is 

$$\sigma^2 \left(A^T A\right)^{-1}. \tag{49}$$

On the other hand, the FIM for the setting (1) is given by 

$$J = \frac{1}{\sigma^2} A^T A. \tag{50}$$

Since A has full column rank, the FIM is invertible. Consequently, 
it is seen from (49) and (50) that the LS approach achieves the 
CRB $J^{-1}$ for unbiased estimators. This well-known property 
demonstrates that in the unconstrained setting, the LS 
technique is optimal among all unbiased estimators. 

The LS estimator, like any unbiased approach, is also 
$\mathcal{S}$-unbiased. However, with the addition of the constraint $x_0 \in \mathcal{S}$, 
one would expect to obtain improved performance. It is 
therefore of interest to obtain the CRB for the constrained 
setting. To this end, we first note that since J is invertible, 
we have $\mathcal{R}(UU^T J UU^T) = \mathcal{R}(UU^T)$ for any U, and 
consequently (26) holds for any matrix B. The bound (27) of 
Theorem 1 thus applies regardless of the bias gradient matrix. 

For simplicity, in the following we derive the CRB for 
$\mathcal{S}$-unbiased estimators. A calculation for arbitrary $\mathcal{S}$-bias 
functions can be performed along similar lines. Consider first 
values $x \in \mathcal{S}$ such that $\|r(x)\|_0 < s$. Then, $\|r(x) + \delta e_i\|_0 \le s$ 
for any $\delta$ and for any $e_i$. Therefore, 

$$x + \delta D e_i \in \mathcal{S} \tag{51}$$

for any $\delta$ and $e_i$. In other words, the feasible directions include 
all columns of D. Since it is assumed that D has full row rank, 
this implies that the feasible subspace $\mathcal{F}$ equals $\mathbb{R}^n$, and the 
matrix U of (20) can be chosen as U = I. 

Next, consider values $x \in \mathcal{S}$ for which $\|r(x)\|_0 = s$. Then, 
for sufficiently small $\delta > 0$, we have $\|r(x) + \delta v\|_0 \le s$ if and 
only if $v = e_i$ for some $i \in \mathrm{supp}(r(x))$. Equivalently, 

$$x + \delta v \in \mathcal{S} \text{ if and only if } v = D e_i \text{ and } i \in \mathrm{supp}(r(x)). \tag{52}$$

Consequently, the feasible direction subspace in this case 
corresponds to the column space of the matrix $D_x$ containing 
the s columns of D indexed by $\mathrm{supp}(r(x))$. From (7) we 
have spark(D) > s, and therefore the columns of $D_x$ are 
linearly independent. Thus the orthogonal projector onto $\mathcal{F}$ is 
given by 

$$P \triangleq UU^T = D_x \left(D_x^T D_x\right)^{-1} D_x^T. \tag{53}$$

Combining these calculations with Theorem 1 yields the 
following result. 

Theorem 3: Consider the estimation setting (1) with the 
constraint (2), and suppose spark(D) > 2s. Let $\hat{x}$ be a 
finite-variance $\mathcal{S}$-unbiased estimator. Then, 

$$\begin{aligned} \mathrm{Cov}(\hat{x}) &\succeq \sigma^2 \left(A^T A\right)^{-1} && \text{when } \|r(x)\|_0 < s, \\ \mathrm{Cov}(\hat{x}) &\succeq \sigma^2 \left(P A^T A P\right)^\dagger && \text{when } \|r(x)\|_0 = s. \end{aligned} \tag{54}$$

Here, P is given by (53), in which $D_x$ is the n x s matrix 
consisting of the columns of D participating in the (unique) 
s-element representation $D\alpha$ of x. 

As in Theorem 2, the bound exhibits a dichotomy between 
points having maximal and non-maximal support. In the 
former case, the CRB is equivalent to the bound obtained when 
the support set is known, whereas in the latter the bound is 
equivalent to an unconstrained CRB. This point is discussed 
further in Section VI. 
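
For the denoising case A = I, the two regimes of the bound are easy to evaluate, since the projected FIM then reduces to a scaled orthogonal projector. The sketch below builds the projector onto the span of the active atoms and evaluates both cases for a hypothetical random dictionary. The function name `denoising_crb`, the sizes, the chosen supports, and the seed are all illustrative assumptions.

```python
import numpy as np

def denoising_crb(A, D, support, sigma, s):
    """Unbiased bound on Cov(x_hat) for x = D alpha with ||alpha||_0 <= s."""
    AtA = A.T @ A
    if len(support) < s:
        return sigma**2 * np.linalg.inv(AtA)      # unconstrained bound
    D_sub = D[:, support]
    # Orthogonal projector onto the span of the active atoms.
    P = D_sub @ np.linalg.inv(D_sub.T @ D_sub) @ D_sub.T
    return sigma**2 * np.linalg.pinv(P @ AtA @ P, rcond=1e-10)

rng = np.random.default_rng(5)                    # arbitrary seed
n, p, s, sigma = 8, 12, 2, 0.3                    # hypothetical sizes
D = rng.standard_normal((n, p))
A = np.eye(n)                                     # denoising: A = I

crb_max = denoising_crb(A, D, [1, 7], sigma, s)   # maximal support
crb_small = denoising_crb(A, D, [1], sigma, s)    # non-maximal support
```

With A = I, the trace of the bound is sσ² at maximal-support points and nσ² otherwise, which makes the gap between the oracle-type and unconstrained bounds explicit.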

V. Numerical Results 

In this section, we demonstrate the use of the CRB for 
measuring the achievable MSE in the sparse estimation problem 
(4). To this end, a series of simulations was performed. In each 
simulation, a random 100 × 200 dictionary H was constructed 
from a zero-mean Gaussian IID distribution, whose columns 
$h_i$ were normalized so that $\|h_i\|_2 = 1$. A parameter $\alpha_0$ was 
then selected by choosing a support uniformly at random and 
selecting the nonzero elements as Gaussian IID variables with 
mean and variance 1. Noisy measurements y were obtained 
from (4), and $\alpha_0$ was then estimated using BPDN (9), the DS 
(10), and the GDS (11). The regularization parameters were 
chosen as $\tau = 2\sigma\sqrt{\log p}$ and $\gamma = 4\sigma\sqrt{\log(p - s)}$, rules of 
thumb which are motivated by a theoretical analysis [11]. The 
MSE of each estimate was then calculated by repeating this 
process with different realizations of the random variables. 
The unbiased CRB was calculated using (44). In this case, the 
unbiased CRB equals the MSE of the oracle estimator (13), but 
as we will see below, interpreting (44) as a bound on unbiased 
estimators provides further insight into the estimation problem. 
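
The data-generation step of such an experiment can be sketched as follows. This is not the authors' simulation code: the function name `draw_problem`, the seed, and in particular the use of zero-mean, unit-variance nonzero entries are assumptions made for this illustration. The estimators themselves (BPDN, DS, GDS, ML) are not implemented here; only one problem realization and the corresponding unbiased CRB are produced.

```python
import numpy as np

def draw_problem(rng, m=100, p=200, s=3, sigma=0.01):
    """Draw one realization of a sparse estimation problem y = H alpha0 + w."""
    H = rng.standard_normal((m, p))
    H /= np.linalg.norm(H, axis=0)                   # normalize columns to unit l2 norm
    support = rng.choice(p, size=s, replace=False)   # support chosen uniformly at random
    alpha0 = np.zeros(p)
    alpha0[support] = rng.standard_normal(s)         # ASSUMPTION: zero mean, unit variance
    y = H @ alpha0 + sigma * rng.standard_normal(m)
    return H, alpha0, y, np.sort(support)

rng = np.random.default_rng(6)                       # arbitrary seed
sigma, s = 0.01, 3
H, alpha0, y, support = draw_problem(rng, s=s, sigma=sigma)

# Unbiased CRB for this realization: sigma^2 * Tr((H_a^T H_a)^{-1}).
H_sub = H[:, support]
crb = sigma**2 * np.trace(np.linalg.inv(H_sub.T @ H_sub))
```

Averaging the squared error of a given estimator over many such realizations, and the CRB over the same realizations, yields curves of the kind compared in this section.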

A first set of experiments was conducted to examine the 
CRB at various SNR levels. In this simulation, the ML 
estimator (8) was also computed, in order to verify its con- 
vergence to the CRB at high SNR. Since the ML approach 
is computationally prohibitive when p and s are large, this 
necessitated the selection of the rather low support size s = 3. 

Fig. 2. MSE of various estimators compared with the unbiased CRB (44), for (a) varying SNR and (b) varying sparsity levels. 

The MSE and CRB were calculated for 15 SNR values by 
changing the noise standard deviation σ between 1 and 10~ . 
The MSE of the ML approach, as well as the other estimators 
of Section II-B, is compared with the CRB in Fig. 2(a). 
The convergence of the ML estimator to the CRB is clearly 
visible in this figure. The performance of the GDS is also 
impressive, being as good or better than the ML approach. 
Apparently, at high SNR, the DS tends to correctly recover the 
true support set, in which case GDS (11) equals the oracle (13). 
Perhaps surprisingly, applying a LS estimate on the support set 
obtained by BPDN (which could be called a "Gauss-BPDN" 
strategy) does not work well at all, and in fact results in higher 
MSE than a direct application of BPDN. (The results for the 
Gauss-BPDN method are not plotted in Fig. 2.) 

Note that some estimation techniques outperform the oracle 
MSE (or CRB) at low SNR. It may appear surprising that 
a practical technique such as the DS outperforms the oracle. 
The explanation for this stems from the fact that the CRB 
(44) is a lower bound on the MSE of unbiased estimators. 
The bias of most estimators tends to be negligible in 
low-noise settings, but often increases with the noise variance $\sigma^2$. 
Indeed, when $\sigma^2$ is as large as $\|\alpha_0\|_2^2$, the measurements carry 
very little useful information about $\alpha_0$, and an estimator can 
improve performance by shrinkage. Such a strategy, while 
clearly biased, yields lower MSE than a naive reliance on the 
noisy measurements. This is indeed the behavior of the DS and 
BPDN, since for large $\sigma^2$, the $\ell_1$ regularization becomes the 
dominant term, resulting in heavy shrinkage. Consequently, it 
is to be expected that such techniques will outperform even 
the best unbiased estimator at low SNR, as indeed occurs in 
Fig. 2(a). 
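
The benefit of shrinkage at low SNR can be illustrated with a scalar toy model, which is an assumption introduced here and not one of the estimators studied in the paper. For a single measurement y = α + w with noise variance σ², the linear estimator c·y has variance c²σ² and squared bias (1 − c)²α². The sketch compares its MSE with the unbiased bound σ², using the MSE-minimizing factor c, which depends on the unknown α and therefore serves only as a reference.

```python
def shrinkage_mse(c, alpha, sigma):
    """MSE of the estimator c * y for y = alpha + w, w ~ N(0, sigma^2):
    variance c^2 sigma^2 plus squared bias (1 - c)^2 alpha^2."""
    return c**2 * sigma**2 + (1.0 - c)**2 * alpha**2

alpha = 1.0
for sigma in (0.1, 1.0, 3.0):
    crb = sigma**2                               # unbiased bound, attained by c = 1
    c_best = alpha**2 / (alpha**2 + sigma**2)    # minimizer of shrinkage_mse over c
    print(sigma, crb, shrinkage_mse(c_best, alpha, sigma))
```

At small σ the two values nearly coincide, whereas for σ well above |α| the shrunken estimate has a much lower MSE than any unbiased estimator, mirroring the low-SNR behavior described above.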

The performance of the estimators of Section II-B, exclud- 
ing the ML method, was also compared for varying sparsity 
levels. To this end, the simulation was repeated for 15 support 
sizes in the range 1 ≤ s ≤ 30, with a constant noise 
standard deviation of σ = 0.01. The results are plotted in 
Fig. 2(b). While a substantial gap exists between the CRB and 
the MSE of the practical estimators in this case, the general 
trend in both cases describes a similar rate of increase as 
s grows. Interestingly, a drawback of the GDS approach is 
visible in this setting: as s increases, correct support recovery 
becomes more difficult, and shrinkage becomes a valuable 
asset for reducing the sensitivity of the estimate to random 
measurement fluctuations. The LS approach practiced by the 
GDS, which does not perform shrinkage, leads to gradual 
performance deterioration. 

Results similar to Fig. 2 were obtained for a variety of 
related estimation scenarios, including several deterministic, 
rather than random, dictionaries H. 

VI. Discussion 

In this paper, we extended the CRB to constraint sets sat- 
isfying the local balance condition (Theorem 1). This enabled 
us to derive lower bounds on the achievable performance in 
various estimation problems (Theorems 2 and 3). In simple 
terms, Theorems 2 and 3 can be summarized as follows. 
The behavior of the CRB differs depending on whether or 
not the parameter has maximal support (i.e., $\|\alpha\|_0 = s$). In 
the case of maximal support, the bound equals that which 
would be obtained if the sparsity pattern were known; this 
can be considered an "oracle bound". On the other hand, 
when $\|\alpha\|_0 < s$, performance is identical to the unconstrained 
case, and the bound is substantially higher. We now discuss 
some practical implications of these conclusions. To simplify 
the discussion, we consider the case of unbiased estimators, 
though analogous conclusions can be drawn for any bias 
function. 

When $\|\alpha\|_0 = s$ and all nonzero elements of $\alpha$ are 
considerably larger than the standard deviation of the noise, 
the support set can be recovered correctly with high probability 
(at least if computational considerations are ignored). Thus, in 
this case an estimator can mimic the behavior of the oracle, 
and the CRB is expected to be tight. Indeed, in the high 
SNR limit, the ML estimator achieves the unbiased CRB. On 
the other hand, when the support of $\alpha$ is not maximal, the 
unbiasedness requirement demands sensitivity to changes in all 
components of $\alpha$, and consequently the bound coincides with 
the unconstrained CRB. Thus, as claimed in Section III, in 
underdetermined cases no estimator is unbiased for all $\alpha \in \mathcal{S}$. 

An interesting observation can also be made concerning 
maximal-support points $\alpha$ for which some of the nonzero 
elements are close to zero. The CRB in this "low-SNR" case 
corresponds to the oracle MSE, but as we will see, the bound 
is loose for such values of $\alpha$. Intuitively, at low-SNR points, 
any attempt to recover the sparsity pattern will occasionally 
fail. Consequently, despite the optimistic CRB, it is unlikely 
that the oracle MSE can be achieved. Indeed, the covariance 
matrix of any finite-variance estimator is a continuous function 
of $\alpha$ [22], and the fact that performance is bounded by the 
(much higher) unconstrained bound when $\|\alpha\|_0 < s$ implies 
that performance must be similarly poor for low SNR. 

This excessive optimism is a result of the local nature of 
the CRB: The bound is a function of the estimation setting 
only in an ε-neighborhood of the parameter itself. Indeed, the 
CRB depends on the constraint set only through the feasible 
directions, which were defined in Section III-B as those 
directions which do not violate the constraints for sufficiently 
small deviations. Thus, for the CRB, it is entirely irrelevant 
if some of the components of $\alpha$ are close to zero, as long as 
$\mathrm{supp}(\alpha)$ is held constant. 

A tighter bound for sparse estimation problems may be 
obtained using the Hammersley-Chapman-Robbins (HCR) 
approach [15], [28], [29], which depends on the constraints 
at points beyond the local neighborhood of x. Such a bound 
is likely to yield tighter results for low SNR values, and will 
create a smooth transition between the regions of maximal 
and non-maximal support. However, the bound will depend 
on more complex properties of the estimation setting, such as 
the distance between $D\alpha$ and feasible points with differing 
supports. The derivation of such a bound is a subject for further 
research. 